Abstract: Separating the singing voice from music is a form of speech separation and remains a major challenge in many applications. This paper proposes a support vector machine (SVM) combined with a tandem algorithm to estimate the singing pitch and to separate the singing voice from the music accompaniment. A trend estimation algorithm detects the pitch range of the singing voice and reduces the spurious pitches caused by higher-order harmonics. The tandem algorithm first estimates the pitch and then obtains multiple pitch contours together with their associated time-frequency masks. The pitch contours are then expanded according to temporal continuity, and a post-processing stage is introduced to deal with the “sequential grouping” problem. Once the tandem algorithm has detected multiple pitch contours, the next stage separates the singing voice by estimating the ideal binary mask (IBM), a binary matrix constructed from the premixed source signals. This stage employs a continuous SVM to decode the input mixture into vocal and nonvocal sections. Finally, the separated voice is used to extract the music from the mixture signal. Experiments are performed on a signal containing voice and music, and performance is evaluated using precision, recall, and accuracy.

Keywords: Support vector machine, Music accompaniments, Pitch, Trend estimation, Tandem algorithm.
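
The ideal binary mask and the evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the premixed voice and accompaniment are available as magnitude spectrograms of equal shape, that a time-frequency unit is labeled 1 when the voice exceeds the accompaniment by more than a threshold in dB (0 dB here, a common convention and not necessarily the setting used in this paper), and that precision, recall and accuracy are computed per time-frequency unit. The function names `ideal_binary_mask` and `mask_scores` are hypothetical.

```python
import numpy as np

def ideal_binary_mask(voice_mag, accomp_mag, threshold_db=0.0):
    """Return a binary matrix: 1 where the voice dominates a T-F unit.

    voice_mag, accomp_mag: magnitude spectrograms of the premixed
    sources (assumed inputs, same shape). threshold_db is an assumed
    local SNR criterion, not a value taken from the paper.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 20.0 * np.log10((voice_mag + eps) / (accomp_mag + eps))
    return (local_snr_db > threshold_db).astype(np.uint8)

def mask_scores(estimated, reference):
    """Precision, recall and accuracy of an estimated mask, per T-F unit."""
    est = np.asarray(estimated).astype(bool)
    ref = np.asarray(reference).astype(bool)
    tp = np.sum(est & ref)    # voice units correctly kept
    fp = np.sum(est & ~ref)   # accompaniment units wrongly kept
    fn = np.sum(~est & ref)   # voice units wrongly discarded
    tn = np.sum(~est & ~ref)  # accompaniment units correctly discarded
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / est.size
    return float(precision), float(recall), float(accuracy)

# Toy 2x2 spectrograms (illustrative values only, not experimental data).
voice = np.array([[2.0, 1.0], [0.5, 3.0]])
accomp = np.array([[1.0, 2.0], [1.0, 1.0]])
ibm = ideal_binary_mask(voice, accomp)
scores = mask_scores(np.array([[1, 1], [0, 0]]), ibm)
```

In this sketch the estimated mask would come from the pitch-based separation stage; here it is a hand-written matrix used only to exercise the scoring function.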